perf: speed up LP file writing (2.5-3.9x on large models, no regressions on small) #564
Conversation
Extract _format_and_write() helper that uses lazy().collect(engine="streaming") with automatic fallback, replacing 7 instances of df.select(concat_str(...)).write_csv(...).
Replace the vertical concat + sort approach in Constraint.to_polars() with an inner join, so every row has all columns populated. This removes the need for the group_by validation step in constraints_to_file() and simplifies the formatting expressions by eliminating null checks on coeffs/vars columns.
…r short DataFrame
- Skip group_terms_polars when _term dim size is 1 (no duplicate vars)
- Build the short DataFrame (labels, rhs, sign) directly with numpy instead of going through xarray.broadcast + to_polars
- Add sign column via pl.lit when uniform (common case), avoiding costly numpy string array → polars conversion
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
…e vars Check n_unique before running the expensive group_by+sum. When all variable references are unique (common case for objectives), this saves ~31ms per 320k terms. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Replace np.unique with faster numpy equality check for sign uniformity. Eliminate redundant filter_nulls_polars and check_has_nulls_polars on the short DataFrame by applying the labels mask directly during construction. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Guard against IndexError when sign_flat is empty (no valid labels) by checking len(sign_flat) > 0 before accessing sign_flat[0]. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
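The uniformity check and its empty-array guard can be sketched like this (the helper name is hypothetical; the technique of comparing all entries against the first, plus the `len(sign_flat) > 0` guard, is taken from the commits above):

```python
import numpy as np


def uniform_sign(sign_flat):
    """Return the single sign if all entries are identical, else None.

    A plain equality comparison against the first element is cheaper
    than np.unique, which sorts. The length guard avoids an IndexError
    when no valid labels remain (empty constraint slice).
    """
    if len(sign_flat) > 0 and (sign_flat == sign_flat[0]).all():
        return str(sign_flat[0])
    return None
```

When a single sign is returned, the column can be added via `pl.lit(...)` instead of converting a numpy string array to polars row by row.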
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
…for duplicate (labels, vars) pairs before calling group_terms_polars. Use it in both Constraint.to_polars() and LinearExpression.to_polars() to avoid expensive group_by when terms already reference distinct variables
Wonderful @FBumann ! This is very much welcome!
@FabianHofmann Should I fix the codecov stuff?
FabianHofmann
left a comment
nice one @FBumann ! happy to pull this in! should we remove the dev-script? thanks for the transparency on the benchmarks
dev-scripts/benchmark_lp_writer.py
@@ -0,0 +1,388 @@
#!/usr/bin/env python3
just came to my mind, that a uv runnable script for dev-scripts would actually be super nice. not needed at all, just for inspiration
I'll keep that in mind next time. I'm also using uv. If linopy is too, I'll do it next time ;)
linopy/io.py
kwargs: Any = dict(
    separator=" ", null_value="", quote_style="never", include_header=False
)
try:
do you think this is needed? is the lazy operation still a bit flaky? happy to keep it in if you think it is safer
I'm not sure if you meant the kwargs themselves: I moved them into the method call for readability.
If you meant the try/except:
The streaming engine is stable and recommended (see https://pola.rs/posts/polars-in-aggregate-dec25/).
If we raise the lower bound of the polars version, we can remove the fallback, I think (>=1.31.0, where the old one was removed).
Sorry, the selection was bad. I meant the whole fallback mechanism
Further: it's officially recommended, has transparent fallback built-in (no exceptions), and is on track to become the default
I pushed a commit that did that. I can revert if you want to
Wonderful! Let's pull this in then
No need to revert, but as you prefer
I'm sure this is stable with polars >=1.31!
Let's merge this.
I removed it. I think it would be great to keep such benchmarks, but I'll discuss it in #567 instead.
Changes proposed in this Pull Request
Speed up LP file writing by up to 3.9x on large models, with consistent improvements across all problem sizes. Includes a benchmark script for reproducibility.
Added an LP-file benchmarking script, which accounts for the majority of the lines changed.
Performance optimizations
- Stream concat_str + write_csv via a new _format_and_write() helper (with automatic fallback + warning)
- Replace concat + sort with join for constraint assembly
- Add maybe_group_terms_polars() to skip the expensive group_by when terms already reference distinct variables

Bug fixes
- IndexError on empty constraint slices in the sign_flat check

Benchmark results
Reproduce with:
python dev-scripts/benchmark_lp_writer.py --model basic -o results.json --label "my run"

Synthetic model (2×N² vars, 2×N² constraints)
No regressions on small models, speedup grows with problem size up to 3.9x at 8M variables.
PyPSA SciGrid-DE (realistic power system model, 24–1000 snapshots)
Consistent 2.5–2.7x speedup across all sizes, reaching 7.0s → 2.7s at 2.5M variables / 6M constraints.
Checklist
- Code changes are sufficiently documented in doc.
- A note for doc/release_notes.rst of the upcoming release is included.